The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in 
a previously unknown experimentally generated data set. The challenge was presented to the community in the frame- 
work of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate 
the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring 
fluorescence from promoter sequences fused upstream of a gene for yellow fluorescence protein and inserted in the same 
genomic site of yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 
different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for 
low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced 
a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that 
aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. 
Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of 
promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning ap- 
proaches, can be improved by the addition of biological features such as transcription factor binding sites. 



[Supplemental material is available for this article.] 

One of the main objectives of the Dialogue for Reverse Engineering 
Assessments and Methods (DREAM) (Stolovitzky et al. 2007) is to 
catalyze the interaction between experiment and theory in systems 
biology particularly for quantitative model building. For this purpose, 
unpublished data is used to objectively test team predictions gener- 
ated by their methods/algorithms. The evaluation of participants' 
methods is blind, as inspired by the community challenges posed in 
CASP (Critical Assessment of Techniques for Protein Structure Pre- 
diction). CASP's main goal is to obtain an in-depth and objective as- 
sessment of state-of-the-art techniques for protein structure prediction 
using a set of unpublished protein structures (Moult et al. 1995; 
Shortle 1995; Moult 1996). This same principle is used in DREAM 
where a blind benchmark is provided so predictions from different 
algorithms can be easily compared, thus enhancing the reliability of 
programs/methods used. We describe here the Gene Promoter Ex- 
pression Prediction challenge from DREAM6, identify the best per- 
formers, and discuss the main results, as well as an improvement of 
the top-performing algorithm. The full description of the challenge, as 
was presented to the participants, including the teams' rankings, can 
be found at the DREAM website (http://the-dream-project.org). 

Gene Promoter Expression Prediction challenge 

The level at which genes are transcribed is determined in large 
part by the DNA sequence upstream of the gene, known as the 
promoter region. Although promoters are widely studied, we are still far
from a quantitative and predictive understanding of how transcriptional
regulation is encoded in cis-regulatory elements of gene
promoters (Kaplan et al. 2009; Sharon et al. 2012). One obstacle in 
the field is obtaining accurate measurements of transcription de- 
rived from different promoters. Fusion of promoters to fluorescent 
reporters can be used to determine the relative contribution of 
transcription to the resulting mRNA levels, since they provide 
measurements of promoter activity independent of the se- 
quence of the associated transcript (Kalir et al. 2001). To further 
address this, an experimental system was designed to measure 
the transcription derived from different promoters, all of which 
are inserted into the same genomic location upstream of a re- 
porter gene — a yellow fluorescence protein gene (YFP) (Zeevi et al. 
2011). 

To study a set of promoters that share many regulatory ele- 
ments and thus are suitable for computational learning, data per- 
taining to promoters of most of the ribosomal protein (RP) genes in 
yeast Saccharomyces cerevisiae grown in a rich medium condition 
was obtained (Zeevi et al. 2011). Although ribosomal promoters 
may not capture generic promoter features, the challenge pre- 
sented sought to model RP promoters to address questions left 
unanswered by successful genome-wide models (Beer and Tavazoie 
2004; Gertz and Cohen 2009; Irie et al. 2011), such as what are the 
mechanisms behind the equimolar expression of the RP genes 
despite their varying copy numbers and how the information for 
fine-tuned expression is encoded in promoter regions. Also, un- 
derstanding the basis of fine-tuned regulation of highly homolo- 
gous promoters could provide clues to engineer promoter libraries 
of desired activity starting from a parent promoter sequence. 

The promoter regions for the S. cerevisiae RP genes were de- 
fined as the sequence immediately upstream of the ribosomal 
protein coding region beginning at the translation start site (TrSS) 
and continuing 1200 bp or until reaching another upstream 
gene's coding sequence, selecting whichever came first. This 
removes a source of variability between strains derived from
post-transcriptional regulation related to the coding and 3'
untranslated regions. Each promoter was linked to a URA3 selection
marker (Linshiz et al. 2008) and inserted into the same fixed
location in the yeast genome (Gietz and Schiestl 2007) of a master
strain that contained the YFP gene (see Fig. 1A). In addition to 110
natural RP promoter strains, we constructed 33 strains with site- 
specific mutated RP promoters using similar methods (Gietz and 
Schiestl 2007; Linshiz et al. 2008). 
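
The promoter definition given above (1200 bp upstream of the TrSS, or up to the neighboring upstream coding sequence, whichever comes first) reduces to a simple length rule. The following is a minimal sketch only: the function name `promoter_interval`, the 0-based forward-strand coordinates, and the example positions are assumptions made for illustration and are not taken from the original study.

```python
MAX_PROMOTER_BP = 1200  # upper bound on promoter length stated in the text


def promoter_interval(trss, upstream_cds_end=None, max_len=MAX_PROMOTER_BP):
    """Return the promoter as a half-open interval (start, end).

    trss: 0-based position of the first coding base (translation start
          site) on the forward strand; the promoter ends just before it.
    upstream_cds_end: 0-based position just past the last base of the
          nearest upstream coding sequence, or None if there is none.
    """
    start = max(0, trss - max_len)
    if upstream_cds_end is not None:
        # Stop at the neighboring coding sequence if it comes first.
        start = max(start, upstream_cds_end)
    return start, trss


# Hypothetical coordinates, for illustration only.
print(promoter_interval(5000))        # full 1200-bp promoter
print(promoter_interval(5000, 4400))  # truncated by an upstream gene
```

A reverse-strand gene would need the mirrored calculation; that case is omitted here to keep the sketch short.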

The strains containing the different RP derived promoters 
were synchronized and grown, and their YFP fluorescence was 
recorded in a plate reader. The transcription initiated by each
promoter was measured by its promoter activity, defined as the
average YFP fluorescence during the exponential growth phase divided
by the average optical density (OD) during that time period
(see Fig. 1B,D). Hence, promoter activity represents the average
rate of YFP production from each promoter, per cell per second,
during the exponential phase of growth (Zeevi et al. 2011). As
a control for the experimental error, a red fluorescent protein
(mCherry) was driven by a control promoter, identical in all strains
(see Fig. 1C). Several tests were performed to gauge the accuracy
and sensitivity of the system. The results showed that growth
curves of all strains were nearly identical, YFP levels of independent
clones of the same promoter sequence were indistinguishable
from those of replicate measurements of the same clone, signals
measured in the YFP wavelength were not affected by the presence
of the mCherry protein, and no correlation was found between the
YFP and mCherry promoter activities across the different RP promoter
strains. Finally, the average difference between any two
mCherry strains was ~5%, and when using replicate measurements,
the relative error in the estimated YFP promoter activity of
an RP promoter is ~2%, indicating that it is possible to distinguish
between any two promoters whose activities differ by as little
as ~8% (Zeevi et al. 2011).

Figure 1. Overview of the experimental system and results. (A) Illustration of the master strain into which we integrated all the tested promoters. At a fixed
chromosomal location, the master strain contains a gene that encodes a red fluorescent protein (mCherry), followed by the promoter for TEF2, and a gene that
encodes a yellow fluorescent protein (YFP). Every tested promoter is integrated into this strain, together with a selection marker, between the TEF2 promoter
and the YFP gene. (B) Strains with different promoters have highly similar growth rates. Shown is the growth of 71 different promoter strains, measured as optical
density (OD). Measurements were obtained from a single 96-well plate, with glucose-rich media and a small number of cells from each strain inserted into each
well at time zero. The exponential growth phase is indicated (vertical dashed gray lines). (C) Same as B, but where the measurements correspond to mCherry
intensity. Note the small variability in the intensity of mCherry, which is driven by the same control promoter across the different strains. (D) Same as C, but where
the measurements correspond to YFP intensity. Note the large variability in the intensity of YFP, which is driven by a different promoter in each strain. (Adapted
with permission from Zeevi et al. [2011].) (E) Black line shows the scores from different participating teams plotted in descending order, and red line shows scores
of aggregated teams starting with the score obtained from averaging the prediction results of the two best-performing teams, followed by the three
best-performing teams, and so on until all 21 teams are included. The stand-alone dot represents the post-hoc model combining SVM and biological features.
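
The promoter activity defined in the text (average YFP fluorescence over the exponential growth phase, divided by the average OD over the same window) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis pipeline: the function name, the way the exponential window is passed in, and all numbers are hypothetical.

```python
def promoter_activity(times, yfp, od, t_start, t_end):
    """Average YFP in the exponential window divided by average OD there.

    times, yfp, od: parallel lists of measurement times, YFP fluorescence,
                    and optical density for one strain.
    t_start, t_end: assumed bounds of the exponential growth phase.
    """
    window = [i for i, t in enumerate(times) if t_start <= t <= t_end]
    if not window:
        raise ValueError("no measurements inside the exponential window")
    mean_yfp = sum(yfp[i] for i in window) / len(window)
    mean_od = sum(od[i] for i in window) / len(window)
    return mean_yfp / mean_od


# Made-up time course (minutes, arbitrary fluorescence units, OD).
times = [0, 20, 40, 60, 80]
yfp = [10.0, 20.0, 40.0, 80.0, 90.0]
od = [0.1, 0.2, 0.4, 0.8, 0.9]
print(promoter_activity(times, yfp, od, t_start=20, t_end=60))
```

Replacing `yfp` with the mCherry trace gives the corresponding control value for the same strain.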

The challenge 

The challenge consisted of predicting the promoter activity de- 
rived from a given RP promoter sequence. Participants were 
provided with a training set of 90 natural RP promoters (see
Supplemental Table S1) for which both the promoter sequence and
activity were known and a test set of 53 promoters (see Supple- 
mental Table S2) for which only the promoter sequence was given. 
The test set was divided into two subsets. The first subset had 20 
natural RP promoters. The second subset contained 33 promoters 
that are similar to natural RP promoters but have some mutations 
in their sequence. These mutations can be separated into six types:
mutations of TATA boxes (Basehoar et al. 2004); mutations of binding
sites for Fhl1 and for Sfp1, both known transcriptional regulators of
RP genes (Badis et al. 2008; Zhu et al. 2009); mutations to nucleosome
disfavoring sequences; random mutations that occurred unintentionally
while creating the natural promoters; and finally, sequences mutated
intentionally with additional random mutations (see Table 1). The goal
was to predict as accurately as possible
the promoter activity of the 53 promoters in the test set using the 
90 promoters for training. 

Results and analysis 

The challenge was scored in four different ways using criteria that 
considered the "distance" between measured and predicted values 
or differences in rank between measured and predicted values. The 
first metric consists of a Pearson correlation between the predicted 
and measured promoter activity. The second metric is a normalized 
sum of squared differences. The third is the Spearman rank corre- 
lation, which is essentially the Pearson correlation between the 
ranks, and the fourth metric is a normalized sum of the squared 
difference in ranks. These metrics were then combined into a score 
(see Methods, Eqs. 1-5). 
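
Two of the four metrics named above, the Pearson correlation and the Spearman rank correlation (the Pearson correlation between ranks), can be sketched in a few lines. This is a minimal illustration, not the challenge's scoring code: the normalizations of the two squared-difference metrics and the way the four metrics are combined are defined in Methods (Eqs. 1-5) and are not reproduced here. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation between the ranks."""
    return pearson(ranks(x), ranks(y))
```

Calling `pearson(predicted, measured)` and `spearman(predicted, measured)` on two lists of promoter activities gives the correlation-based parts of the evaluation; the rank-based metric is insensitive to monotonic rescaling of the predictions, which is why the two can disagree.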

As shown in Figure 1E and Table 2, out of 21 participating
teams, team FiRST was the best performer, with a score of 1.88,
followed by team c4lab with 1.55, in a close race for second
place with the third team, which was then followed by a monotonic
decrease in the participants' scores. When a series of aggregated
teams is formed by averaging the predicted promoter activity values
of the best N teams, the score of the aggregated best 15 teams
becomes 1.49, close to that of the second-best performing team
(c4lab) (see Fig. 1E). Scores for the remaining aggregated teams
are also observed to be above that of the fourth-ranked team, showing
that blending community predictions produces robust results (see
Supplemental Material, DREAM6 Participants Predictions files).
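
The aggregation described above, averaging the predicted activity of each promoter over the best N teams, can be sketched as follows. The function name and the toy numbers are assumptions for illustration; each aggregated prediction would then be scored exactly like an individual submission.

```python
def aggregate_top_n(ranked_predictions, n):
    """Average, promoter by promoter, the predictions of the n best teams.

    ranked_predictions: list of per-team prediction lists, best team first;
                        every list covers the same promoters in the same order.
    """
    if not 1 <= n <= len(ranked_predictions):
        raise ValueError("n must be between 1 and the number of teams")
    top = ranked_predictions[:n]
    return [sum(values) / n for values in zip(*top)]


# Toy example: three teams, four promoters (made-up numbers).
teams = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, 2.5, 3.5],
    [0.5, 1.5, 3.5, 4.5],
]
for n in range(2, len(teams) + 1):
    print(n, aggregate_top_n(teams, n))
```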

We analyzed whether some participants were better at pre- 
dicting specific promoters but could not find any correlation be- 
tween overall team ranking and the number of promoters a team 
predicted best. Also, when predicting single promoters, the overall 
highly ranked methods did not rank first more often than lower 
ranked ones but fared well across all promoters. 

In order to investigate whether some promoters were harder
to predict, we calculated the average distance $d^2_i$ over all
participants for promoter i from the promoter's predicted value to its
measured value (see Eq. 6, Methods). As seen in Figure 2A, where
promoters are ordered by increasing $d^2$, five promoters out of the
53 stand out for being predicted with less accuracy. We next
divided the promoters based on $d^2$ into two groups consisting of the
best 30 predictions (green dots, Fig. 2A) and the 23 worst
predictions (red dots, Fig. 2A) and plotted the Pearson correlation of
each of the participating teams for these two groups of promoters
(Fig. 2B). For all teams, the Pearson correlation clearly separated
the best-predicted and worst-predicted promoters as defined by $d^2$,
showing that, for all participants, promoters could be consistently
divided into two groups, one of which was harder to predict than
the other.
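
The per-promoter analysis above can be sketched as follows. Note the assumption: the distance is taken here as the plain squared difference between predicted and measured values averaged over participants; the exact form used in the study, including any normalization, is given by Eq. 6 in Methods and may differ. Function names and numbers are hypothetical.

```python
def mean_squared_distance(predictions, measured):
    """Per-promoter squared error averaged over participants.

    predictions: predictions[p][i] is participant p's value for promoter i.
    measured:    measured[i] is the measured activity of promoter i.
    """
    n_participants = len(predictions)
    return [
        sum((pred[i] - measured[i]) ** 2 for pred in predictions) / n_participants
        for i in range(len(measured))
    ]


def split_by_distance(distances, n_best):
    """Return (best, worst) promoter indices, ordered by increasing distance."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return order[:n_best], order[n_best:]


# Toy data: two participants, three promoters (made-up numbers).
measured = [1.0, 2.0, 3.0]
predictions = [[1.0, 2.5, 5.0], [1.0, 1.5, 4.0]]
d2 = mean_squared_distance(predictions, measured)
best, worst = split_by_distance(d2, n_best=2)
print(d2, best, worst)
```

With the challenge data, `n_best=30` would reproduce the split into 30 best-predicted and 23 worst-predicted promoters, after which a per-team correlation can be computed separately on each group.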

To identify the source of the difficulty in predicting the expression
values of these 23 promoters, we explored the possibility
of this list being enriched for mutant promoters. Wild-type promoters
were found to be distributed equally between the worst-predicted
promoters (10 empty dots on red side of Fig. 2A) and
best-predicted promoters (10 empty dots on green side of Fig. 2A).
A Fisher test shows no statistical significance for mutant or
wild-type promoter enrichment. We next used the measure $\chi_i$ (see Eq. 7,
Methods) to evaluate whether promoter activity was correlated to
the difficulty of predicting its value. Figure 2C, showing how $\chi_i$
varies for each promoter, reflects that participants' performance is
anti-correlated with promoter activity, with a Pearson correlation
of -0.836. Participants' prediction accuracy can be divided into
three groups according to the measured promoter activity $\xi_i$: $\xi_i$ values
between 1 and 3 ($\langle \chi_i \rangle = 0.25 \pm 0.73$ for i such that $1 < \xi_i < 3$, 18
promoters), which fared significantly better than the following
two groups: $\xi_i$ values less than 1 ($\langle \chi_i \rangle = 3.02 \pm 1.10$ for i such that
$\xi_i < 1$, 8 promoters, t-test $P < 1.1 \times 10^{-11}$); and $\xi_i$ values higher than
3 ($\langle \chi_i \rangle = -1.48 \pm 0.51$ for i such that $\xi_i > 3$, 7 promoters, t-test
$P < 1.75 \times 10^{-7}$). Both observations are independent of whether
the promoters contain mutations (Fig. 2C, full and empty dots).
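
The enrichment check can be illustrated with a two-sided Fisher exact test on the 2x2 table implied by the counts in the text: 10 wild-type and 13 mutant promoters among the 23 worst-predicted, and 10 wild-type and 20 mutant promoters among the 30 best-predicted. This is a sketch written for this purpose with a hypothetical function name; which variant of the test the authors ran is not stated, so the two-sided form is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        prob(k) for k in range(low, high + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Rows: worst-predicted, best-predicted; columns: wild type, mutant.
p_value = fisher_exact_two_sided(10, 13, 10, 20)
print(p_value)
```

The resulting P-value is well above 0.05, in line with the statement that neither group is enriched for mutant or wild-type promoters.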

As we could not find clear differences between mutant and
wild-type promoters when using the $d^2$ measure, we calculated
a different type of distance, $d^1$, to compare participant predictions
and measurements (see Eq. 8, Methods). As shown in Figure 3A, $d^1$
clearly distinguishes wild-type promoters (mean value of $d^1$ is
1.62 ± 0.22) from mutant promoters (mean value of $d^1$ is 2.23 ±
0.41, t-test $P < 8 \times 10^{-8}$). In order to understand the differences in $d^1$
for the various mutant promoters, we formed six groups according
to the nature of their mutations. In Figure 3B, the different groups of
mutations were ordered according to the associated $d^1$ mean value.
Participants' predictions fared better for mutations typically inducing
small changes in promoter expression (low $d^1$ in Fig. 3B),
such as random mutations. Conversely, sequence mutations known
to induce large changes by lowering promoter expression, such as
mutations to the TATA box, were the worst-predicted group (high $d^1$
in Fig. 3B). As there is not enough data to extract a statistical measure
of the differences between groups of promoters, we decided to follow
up on the previous observation and compare the $d^1$ value for each
mutant promoter to the relative promoter activity difference induced
by the mutations. As shown in Figure 3C, $d^1$ grows exponentially
with increasing differences between wild-type and mutant promoter
expression. Hence, prediction accuracy for mutant promoters worsened
when mutations induced larger changes in expression.

Table 1. Information on the promoter sequence mutations
For every promoter, locations of TATA boxes (pink circles), and of binding sites for Rap1 (red), Fhl1 (green), and Sfp1 (blue) are shown. In addition, shown is the
per-base pair nucleosome occupancy of every promoter (occupancy is shown in a white to black scale, with white corresponding to no occupancy and black to
full occupancy), predicted using a computational model of nucleosome sequence preferences (Kaplan et al. 2009). Also shown is a matrix (left) summary of the
number of factor sites that appear in every RP promoter (counts for Rap1 are only shown for the 400 bp upstream of the TrSS; for Fhl1 and Sfp1, 300 bp; and for
TATA, 200 bp), along with a column representing whether the corresponding RP gene exists in a single copy in the yeast genome (first column, black) and
whether it is an essential gene (second column, gray). The length of each native promoter is indicated (cyan vertical line) if it is shorter than 600 bp.

Improving promoter expression prediction 
by adding biological features 

As shown in Figure 1E, scores of aggregated teams were observed to
be robustly above that of the fourth-ranked team but did not fare better
than the three best-performing teams. As the best-performing
models of this challenge did not include biological features such as
the binding sites for Fhl1 and Sfp1, known transcriptional regulators
of RP genes, we decided to try to improve
model performance by including biological features in the
best-performer algorithm of team FiRST. To do this, we modified a
recently published mechanistically motivated model that takes into
account the competition between transcription factors and nucleosomes
for DNA binding sites in the regulation of gene expression
(Zeevi et al. 2011) (Eqs. 9 and 10; see Methods). The score
for this model based on Cp, the Pearson correlation between predicted
and observed activity, was 0.49 (see Eq. 1, Methods). We
then combined this model with that of the best-performing team,
FiRST, in two ways. In the first approach, we averaged the predicted
activity of each promoter by team FiRST and the mechanistic
model. The correlation between the predicted and actual activities
remained the same as for FiRST (~0.65) (see Table 2), demonstrating
the robustness of aggregating predictions even when one method
has considerably lower performance. Given that the method by team
FiRST did not explicitly use transcription factor binding, we reasoned
that incorporating the transcription factor binding site information
directly into team FiRST's model should be complementary to the
method and could reveal interactions between transcription factors
and sequence context. To test this idea, we included the transcription
factor binding affinities for each promoter as additional features
to those used by team FiRST (see Supplemental Table S3 for details
on the features). We then trained a support vector machine (SVM)
using the combined features from both models. The resulting model
provided predictions that had a correlation of 0.67 to the actual
promoter activity and a combined score of 2.6 (Cp = 0.67289; X2 =
39.79601; Sp = 0.66815; R2 = 30.75429) (see Fig. 1E and Supplemental
Data, DREAM6 Participants Predictions files), presenting
a significant improvement and best performance compared to all
the other teams or the aggregate of their predictions.

Table 2. Scores from different teams ranked in descending order
Only names of the two best-performing teams are indicated. Cp (see Eq. 1) indicates the Pearson metric, X2 the score based on the $\chi^2$ metric (see Eq. 2),
Sp the score based on the Spearman metric (see Eq. 3), and R2 the score based on the rank$^2$ metric (see Eq. 4). p1, p2, p3, and p4 are the associated
P-values based on the null-hypothesis generated from randomized values for the distances Cp, X2, Sp, and R2. Note that P-values become significant
across the table if a less stringent null-hypothesis is applied. The last column is the final score calculated from the product of the four P-values (see Eq. 5).

Discussion 

The scoring and analysis of submitted predictions for the DREAM6
Gene Promoter Expression Prediction challenge revealed excellent
performances (see Fig. 1E and Table 2). This is, indeed, remarkable, as
the data set presented a difficult learning problem due to the high
homology between the promoters in the relatively small RP promoter
training set (yeast only has 137 ribosomal promoters) and the lower
dynamic range of promoter activity compared to what would be
observed on a genome-wide scale. Methods with typically high
accuracy in genome-wide predictions ranked 11 and 12 here (see
Supplemental Table S4), indicating that the challenge posed by RP
promoters is distinct and requires the development of specific
methods in order to be solved.

Choosing the right scoring scheme to evaluate the challenge
was essential, as participants fared differently depending on the
metric used (see Table 2). The best-performing team did not get the
top score for all metrics nor all promoters but was the most consistent.
Also, participants had difficulties while predicting
low-expressed promoters and certain mutant RP promoters. Finally,
community predictions were robust to the aggregation of teams'
results, and the best score of 2.6 was obtained by combining features
from team FiRST's machine-learning model and a mechanistic
model based on biophysical assumptions.

During their presentations at the DREAM6 conference, the
best-performing teams, FiRST and c4lab, showed that mutated
promoters were harder to predict than natural promoters. Team
FiRST mainly used the first 100 bp of the promoter to predict promoter
activity, and team c4lab used a 12-mer motif. Team FiRST
tried to include features such as k-mer counts (mono, di, tri, tetra,
and penta), homopolymer stretches, promoter length, DNA bendability,
DNA protein deformability, DNA bending stiffness, and
nucleosome binding potential. They used a machine-learning SVM
approach to select 12 features that can be summarized as follows:
one mononucleotide G, one dinucleotide GT, six trinucleotides, 12
tetranucleotides, length of T-tracts and TA-tracts, DNA deformability
(a detailed description of this model will appear in a different
manuscript). Team c4lab also used different k-mer counts to finally
concentrate on 12-mer motifs used in a support vector regression
approach but did not find any correlation between the 12-mers and
biological features such as distance to a TrSS or copy number motifs
(see Supplemental Table S4 for a brief description of other
participants' methods).
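
Sequence features of the kind listed above (k-mer counts, homopolymer tract lengths, promoter length) are straightforward to extract. The sketch below is illustrative only and is not team FiRST's or team c4lab's implementation, which is not described in detail here; the function names, the feature dictionary, and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count overlapping k-mers in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def longest_tract(sequence, base="T"):
    """Length of the longest uninterrupted run of `base`."""
    best = current = 0
    for ch in sequence.upper():
        current = current + 1 if ch == base else 0
        best = max(best, current)
    return best


promoter = "ATTTTGTACGTTT"  # made-up sequence, for illustration only
features = {
    "G": kmer_counts(promoter, 1)["G"],    # mononucleotide count
    "GT": kmer_counts(promoter, 2)["GT"],  # dinucleotide count
    "longest_T_tract": longest_tract(promoter),
    "length": len(promoter),
}
print(features)
```

A feature vector like this, computed for each promoter, is the kind of input a support vector machine or support vector regression model would be trained on; restricting `promoter` to the bases nearest the TrSS would mimic the focus on the proximal region mentioned in the text.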

Neither of the best-performing teams directly used general
features related to transcription factors such as TATA boxes and
nucleosome occluding sequences. Actually, none of the four biological
features targeted by the mutations (TATA boxes, binding
sites for the transcriptional regulators Fhl1 and Sfp1, and
nucleosome disfavoring sequences) were detected by the
participants. Since most participants did not include these features
in their models, it is not surprising that many fared worse with
promoters where these sequences were mutated. Figure 3, B and C,
shows precisely that, as mutation-induced expression changes increase,
predictions become worse. One exception is team FiRST's
machine-learning method that was able to identify a number of
nucleosome disfavoring features, in particular TA-tracts, as being
useful in predicting promoter activity.

Figure 2. Analysis of promoter prediction results. (A) Promoters are ordered by increasing $d^2_i$, the distance between predicted and measured values averaged
over participants (see Eq. 6, Methods), where $x_{ip}$ is the predicted value of promoter i by participant p = 1, 2, ..., 21, and $\xi_i$ is the measured value for
promoter i = 1, 2, ..., 53. Green dots represent the 30 best predictions, and red dots
the 23 worst predictions. Empty dots represent the 20 wild-type promoters; full dots represent the 33 mutated promoters. (B) The Pearson correlation of
each of the participating teams is shown in green dots for the best predictions and in red dots for the worst predictions as defined in A. Teams are ordered
by rank based on their final score. (C) For each promoter, $\chi_i$ is plotted in logarithmic scale against the promoter activity value. Empty dots represent
wild-type promoters and full dots mutant promoters.

During the DREAM6 conference discussion, an audience
member proposed that the training set should have included mutated
promoter sequences. However, an intended feature of the
challenge was to indicate that mutated sequences were present in
the test set without giving hints or providing training data on sequence
changes that could affect the promoter expression level.
We expected participants to analyze the origin of these mutations,
and we think that our strategy was correct, as Figure 2A shows that,
although participants did not look for the origin of mutated promoters,
these were distributed equally between the groups of best-
and worst-predicted promoters. It is only when all mutated and
wild-type promoters are separated into two groups that participants'
predictions for those two groups can be differentiated (Fig. 3A).

The mechanism by which Fhl1, Sfp1, Rap1, and TATA boxes
contribute to the promoter expression appears to follow a simple
rule, where more sites from these factors in closer proximity to the
TrSS result in higher promoter activity (Zeevi et al. 2011). The
contribution of one of these factors to the overall promoter activity
depends on the specific organization of its sites within the promoter
(Lieb et al. 2001; Wade et al. 2004; Sharon et al. 2012). As
shown in Figure 2C, participants had difficulties predicting low-
and high-expressed promoters. The thresholds for low/high promoter
activity are sharply defined, at values lower than
1.5 and higher than 3, respectively. Seven of the eight promoters
whose activity is higher than 3 are mutated promoters, shown to
be difficult to predict. Low-activity promoters are RPL41B_Mut1,
RPL15A_Mut1, RPL21B, RPL4A_Mut6, RPL11A, RPL35B, RPL39_Mut1,
and RPS14B_Mut1. As the experimental setup can distinguish
promoter activities separated by less than 8%, we do not think that the
difficulties with predicting low-activity promoters arise from experimental
limitations while measuring lower signals. Instead, as shown in Table
1, promoters RPL41B_Mut1, RPL21B, RPL11A, RPL35B, RPL39_Mut1,
and RPS14B_Mut1 have dispersed binding motifs or lack them (see also
Supplemental Table S5). The other mutations present in promoters of
low activity are RPL4A_Mut6 and RPL15A_Mut1, which cause an
~70% decrease in promoter activity and, as discussed, participants
had difficulties predicting strong mutation effects. We conclude that
the difficulty participants had while predicting low-expressed promoters
is, indeed, due to less information available in these promoter
sequences and a less coherent organization of the different sequence
features, with very few TATA boxes, Fhl1, Rap1, and Sfp1 sites.

Genome Research 1933
www.genome.org
Meyer et al.

[Figure 3 panels: (A) x-axis, promoters (alphabetical order); legend, wild type (empty dots) and mutant (full dots). (B) x-axis, types of mutations. (C) x-axis, percentage change in expression, mutant vs. WT.]

Figure 3. Analysis of prediction results for mutated promoters. (A)
Promoters were divided into two groups depending on whether they were
wild type (empty dots) or contained mutations (full dots) and plotted according
to $d^1_i$, the distance between predicted and measured values averaged over
participants (see Eq. 8, Methods), where $x_{ip}$ is the predicted value of promoter i by
participant p = 1, 2, ..., 21, and $\xi_i$ is the measured value for promoter i =
1, 2, ..., 53. (B) Mutant promoter expression values were grouped
according to the nature of the mutation and ordered by mean $d^1$ value
for each group. The six groups consist of mutations of TATA boxes
(Δtata), of binding sites for Fhl1 (Δfhl1) and Sfp1 (Δsfp1), mutations to
nucleosome disfavoring sequences (ΔNucDisf), random mutations (Random),
and finally, sequences mutated intentionally with additional random
mutations (Addition). The $d^1$ value for each promoter is indicated
by full dots; the mean value of $d^1$ for each of the six grouped mutations is
indicated by a thick bar. (C) For each mutated promoter i, $d^1_i$ is plotted as
a function of the percentage of expression value change induced in the
wild-type promoter by the mutation. The vertical scale is logarithmic.

Finally, the improvement of the best-performing model, by 
mixing a biology-based mechanistic approach and machine- 
learning techniques, implies that the wisdom of crowds could be 
tapped further by methods that directly incorporate distinct fea- 
tures from independent models. Simple aggregation might miss 
the interactions between the different features in the models se- 
lected. Estimating the relative contributions of features extracted 
from each model could be approached as a learning problem where 
the different models are reduced to being independent tools for 
feature selection. Once the relevant features are selected, they are 
integrated into a new model, and adequate parameters are learned 
once again. Overall, we think this study not only provides 
a benchmark for the assessment of methods predicting promoter 
activity from sequence, but it also shows that understanding the 
basis of fine-tuned regulation of highly homologous promoters 
could provide clues for engineering promoter libraries to obtain 
a desired promoter strength from a parent promoter sequence. 

Methods 

Constructing promoter strains 

A construct of ADH1 terminator-mCherry-TEF2 promoter-YFP-ADH1
terminator-NAT1 was inserted into the SGA-compatible
strain Y8205 at the his3 deletion location (the construct replaced 
chromosome 15, at base pairs 721987-722506). The resulting 
strain served as a master strain for the entire library. Desired pro- 
moters were lifted by PCR from the BY4741 yeast strain. Primers 
contained one part matching the ends of the lifted promoters, and 
a constant part at their 5' end matching the first 25 bases of the YFP
gene (for reverse primers) or a linker sequence (for forward primers; 
see all primer sequences in Zeevi et al. 2011). Each promoter was 
linked to a URA3 selection marker (Linshiz et al. 2008) and then 
amplified such that its genomic integration sites increased to 45/ 
50 bp. Integration into the genome was performed by homologous 
recombination as described in Gietz and Schiestl (2007). All steps 
were performed on 96-well plates, except for growing the final 
clones, which was performed on six-well plates (2% agar, SCD- 
URA). To validate the inserted promoter sequences, the insertions 
were lifted from each target strain by PCR and sequenced. 

Constructing promoter strains with targeted mutations 

To create a mutated promoter, we amplified it in two parts which 
flank the desired mutation area. The left part was amplified using 
a reverse primer with a 35-bp tail at its 5' end that contains the 
desired mutation, while the right part was amplified using a forward 
primer that also had a similar tail. The two new parts, both 
containing the desired mutation in an overlapping region of 35 bp, 
were then connected, similar to the way in which we connected 
promoters to the URA3 selection marker. See Table 1 and Supple- 
mental Table S6 for more information. 



Library measurements 

Cells were inoculated from stocks kept at -80°C into SCD (180 µL, 
96-well plate) and left to grow at 30°C for 48 h, reaching complete 
saturation. Next, 8 µL were passed into fresh medium (180 µL) 
according to the desired condition (e.g., SCD, ethanol, heat shock). 
Measurements were carried out every ~20 min using a robotic 
system (Tecan Freedom EVO) with a plate reader (Tecan Infinite 
F500). Each measurement included optical density (filter wavelength 
600 nm, bandwidth 10 nm), YFP fluorescence (excitation 
500 nm, emission 540 nm, bandwidths 25/25 nm, respectively), 
and mCherry fluorescence (excitation 570 nm, emission 630 nm, 
bandwidths 25/35 nm, respectively). Measurements were carried 
out using a total of eight different conditions. In all experiments, 
yeast cells were grown on SC (6.9 g/L YNB, 1.6 g/L amino acids 
complete). Four conditions used different 2% sugar growth media: 
SC-glucose, SC-galactose, SC-ethanol, and SC-glycerol. The other 
four conditions used SC-glucose with an additional stress factor: 
rapamycin (40 µg/mL), amino acid starvation (no amino acids 
except histidine and leucine), heat shock (39°C), and osmotic 
stress (750 mM KCl). Every strain was measured in three biological 
replicates for each condition. Most of the data analysis was per- 
formed on data from growth on SC-glucose (without stress), which 
was measured in five replicates. 



Scoring 

The challenge was scored in four different ways using criteria based 
on the "distance" between measured and predicted values or on 
differences in rank between measured and predicted values. As we 
requested predictions of the expression levels from N = 53 promoter 
sequences, let us denote by $X_{ip}$ the predicted activity of 
promoter i for participant p, and by $g_i$ the measured activity of 
promoter i, with i = 1, 2, ..., 53 and p = 1, 2, ..., P, where P = 21 
is the number of teams that participated in the challenge. The score 
based on a Pearson metric for participant p is defined by 

$$C_p = \frac{\langle X_{ip}\, g_i \rangle - \langle X_{ip} \rangle \langle g_i \rangle}{\sqrt{\sigma_{X_p}^{2}\, \sigma_{g}^{2}}} \qquad (1)$$

where the averages and variances are taken over the N promoters. 

In order to calculate for each participant the probability of 
getting by chance a score at least as good, we randomly sampled 
the predictions across the entire set of participants. For each 
promoter i = 1, 2, ..., 53, we chose at random one of the $X_{ip}$ 
predictions, where p = 1, 2, ..., P. We thus obtained a value of $C_p$ 
which corresponded to one possible random choice of predictions among 
all the participants. By repeating the same process 100,000 times, we 
generated a null distribution of distances between measured and 
estimated values, from which a P-value can be estimated for $C_p$. For 
each participant, that P-value was denoted as $p_1$. 

The score based on the $\chi^2$ metric for participant p is defined by 

$$\chi_p^{2} = \frac{1}{N} \sum_{i=1}^{N} (X_{ip} - g_i)^{2} \qquad (2)$$

The null hypothesis was generated in a similar way by generating 
P-values resulting from the permutation of participants' 
predicted values for a given promoter, and also for each participant, 
and that P-value was denoted as $p_2$. 

We also defined a score by comparing the rank of predicted 
values to the actual rank of measured values. Let us denote by $R_{ip}$ 
the predicted rank of promoter i for participant p, with 
$1 \le R_{ip} \le 53$, and by $\rho_i$ the rank of the measured 
promoter i, with i = 1, 2, ..., 53 and p = 1, 2, ..., P. 
Then, the score based on a Spearman metric for participant p is 
defined by 

$$S_p = \frac{\frac{1}{N}\sum_{i=1}^{N} R_{ip}\,\rho_i - \frac{1}{N}\sum_{i=1}^{N} R_{ip} \cdot \frac{1}{N}\sum_{i=1}^{N} \rho_i}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(R_{ip} - \frac{1}{N}\sum_{i=1}^{N} R_{ip}\right)^{2} \cdot \frac{1}{N}\sum_{i=1}^{N}\left(\rho_i - \frac{1}{N}\sum_{i=1}^{N} \rho_i\right)^{2}}} \qquad (3)$$

A null prediction was created by randomly permuting 
participants' predicted values for a given promoter and then ranking 
a given "random" participant to obtain the $R_{ip}$ ranks across the 53 
different rankings of promoters, thus generating a distribution of 
distances between measured and estimated values, for which a 
P-value denoted as $p_3$ can be estimated for $S_p$. The score based on 
a rank$^2$ metric for participant p is defined by 

$$\rho_p^{2} = \sum_{i=1}^{N} (\rho_{ip} - \rho_i)^{2} \qquad (4)$$

where $\rho_{ip}$ is the rank of proximity of $X_{ip}$ to $g_i$, 
$1 \le \rho_{ip} \le P$, and $\rho_i$ the rank of the measured 
promoter i = 1, 2, ..., 53. The null hypothesis 
was derived from the random permutation of participants' predicted 
values for a given promoter and then ranking a given 
"random" participant. The derived P-value is denoted as $p_4$. The 
overall score was defined as a function of the product of all the 
P-values: 

$$\mathrm{Score} = -\frac{1}{4} \log_{10}\left(\prod_{k=1}^{4} p_k\right) \qquad (5)$$
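
The Pearson score and its sampling-based P-value can be illustrated with a minimal Python sketch. This is not the organizers' code: the function names, the toy data, the reduced number of samples, and the handling of constant vectors are all assumptions made for illustration.

```python
import math
import random


def pearson(x, g):
    """Pearson correlation between predictions x and measurements g."""
    n = len(x)
    mean_x = sum(x) / n
    mean_g = sum(g) / n
    cov = sum(a * b for a, b in zip(x, g)) / n - mean_x * mean_g
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_g = sum((b - mean_g) ** 2 for b in g) / n
    if var_x == 0 or var_g == 0:
        return 0.0  # assumption: a constant vector gets a neutral score
    return cov / math.sqrt(var_x * var_g)


def null_p_value(predictions, g, score, n_samples=10_000, seed=0):
    """Fraction of random 'participants' scoring at least as well as `score`.

    predictions[p][i] is the prediction of participant p for promoter i.
    Each random participant takes, for every promoter, the prediction of
    one real participant chosen at random.
    """
    rng = random.Random(seed)
    n_participants = len(predictions)
    hits = 0
    for _ in range(n_samples):
        sample = [predictions[rng.randrange(n_participants)][i]
                  for i in range(len(g))]
        if pearson(sample, g) >= score:
            hits += 1
    return hits / n_samples


# Toy data (hypothetical): 3 participants, 6 promoters.
measured = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predictions = [
    [1.1, 2.0, 2.9, 4.2, 5.1, 5.8],  # close to the measurements
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],  # anti-correlated
    [3.0, 3.5, 3.0, 3.5, 3.0, 3.5],  # nearly uninformative
]
scores = [pearson(x, measured) for x in predictions]
p_values = [null_p_value(predictions, measured, s) for s in scores]
```

With so few samples, a P-value of exactly zero is possible for a very good prediction; a real analysis would use many more samples (the text reports 100,000) before taking logarithms.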



Prediction distances to promoter values 

The average distance $d_i^{2}$ over all participants p for promoter i from 
the promoter predicted value ($X_{ip}$) to the promoter measured value 
($g_i$) is defined as 

$$d_i^{2} = \langle (X_{ip} - g_i)^{2} \rangle_p \qquad (6)$$



We also considered whether promoter activity was correlated 
with the difficulty of predicting its value and used the following 
measure $\chi_i$, defined by 

$$\chi_i = \frac{\langle X_{ip} \rangle_p - g_i}{\sqrt{\langle (X_{ip} - \langle X_{ip} \rangle_p)^{2} \rangle_p}} \qquad (7)$$
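
As a rough illustration of this measure, the sketch below computes, for a single promoter, the offset of the mean prediction from the measured value in units of the spread of the predictions. The function name and the use of the population standard deviation across participants are assumptions.

```python
import math


def prediction_bias(predictions, measured):
    """Offset of the mean prediction from the measurement, divided by the
    spread (population standard deviation) of the predictions.

    predictions: values submitted by all participants for one promoter.
    measured: measured activity of that promoter.
    """
    n = len(predictions)
    mean_pred = sum(predictions) / n
    spread = math.sqrt(sum((x - mean_pred) ** 2 for x in predictions) / n)
    return (mean_pred - measured) / spread
```

A value near zero means the participants were, on average, centered on the measurement; a large absolute value means most of them missed in the same direction. The function raises `ZeroDivisionError` when all predictions are identical.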



We finally calculated a different type of distance d] to 
compare participant predictions and measurements, defined such 
that 



Xi P - ft 



(8) 



Combined model 

We considered binding sites for three transcription factors, Rap1 
(Wade et al. 2004), Fhl1 (Harbison et al. 2004; Schawalder et al. 
2004; Wade et al. 2004), and Sfp1 (Badis et al. 2008; Zhu et al. 
2009), that have been shown to influence yeast ribosomal gene 
expression. Our model considered promoter activity X p as directly 
proportional to the binding likelihood of each of the three tran- 
scription factors to their cognate motifs, above a specific thresh- 
old, relative to the nucleosome binding potential of the same 
sites: 

$$X_p = 1 + \sum_{t \in \mathrm{TFs}} \sum_{i=1}^{|P(t)|} w_t \, P(t = b \mid S[i]) \qquad (9)$$

where P(t) is the set of all potential binding sites for transcription 
factor t above a certain threshold, $w_t$ is a coefficient measuring the 
relative contribution of factor t to the promoter activity, 
determined using MATLAB's nonlinear solver, and $P(t = b \mid S[i])$ is the 
probability that transcription factor t binds its potential site at 
position i in promoter sequence S. To determine the binding sites 
for the three transcription factors, we used their sequence speci- 
ficities documented in position weight matrices (PWMs) (Basehoar 
et al. 2004; Badis et al. 2008; Zhu et al. 2009). In estimating the 
binding threshold for each transcription factor, we explored 
the correlation between promoter activity and sites above each 
possible threshold at intervals of 0.1. For each transcription 
factor, we considered potential binding sites as those with an 
affinity above the threshold and located within known spatial 
localization sites: for Rap1, 400 bp upstream of the TrSS; for Fhl1 
and Sfp1, 300 bp upstream of the TrSS (Zeevi et al. 2011). We 
then modeled the probability for transcription factor binding as 
the weight of the configuration in which the factor is bound 
divided by the sum of the weight of that configuration, the 
weight of the configuration in which the DNA is unbound, and 
the weight of the configuration in which a nucleosome is bound 
to the site: 

$$P(t = b \mid S[i]) = \frac{A_t(S[i])}{1 + A_t(S[i]) + A_{\mathrm{nuc}}(S[i])} \qquad (10)$$

where 1 represents the DNA unbound configuration, $A_t(S[i])$ 
represents the affinity of transcription factor t for the binding site at 
position i in promoter S, and $A_{\mathrm{nuc}}(S[i])$ is the affinity of 
nucleosomes for position i in promoter S. 

For $A_{\mathrm{nuc}}(S[i])$, we used a sequence-based nucleosome affinity 
model to compute the average nucleosome occupancy (Kaplan 
et al. 2009). 

We applied w t coefficients obtained from a nonlinear solver 
trained on 90 promoters to predict promoter activities of a held-out 
set of 53 promoters used in the DREAM challenge. 
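
The structure of the combined model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the affinity values, and the weights are hypothetical, the constant baseline of 1 follows our reading of Equation 9, and the actual coefficients were fitted with MATLAB's nonlinear solver rather than set by hand.

```python
def binding_probability(tf_affinity, nuc_affinity):
    """Weight of the factor-bound configuration divided by the sum of the
    weights of the unbound (1), factor-bound, and nucleosome-bound
    configurations."""
    return tf_affinity / (1.0 + tf_affinity + nuc_affinity)


def promoter_activity(sites, weights, baseline=1.0):
    """Predicted activity of one promoter.

    sites: maps each transcription factor to a list of
        (tf_affinity, nuc_affinity) pairs, one per potential binding site
        that passed the affinity threshold and positional filter.
    weights: maps each transcription factor to its coefficient w_t.
    baseline: constant term (assumed to be 1).
    """
    activity = baseline
    for tf, tf_sites in sites.items():
        for tf_affinity, nuc_affinity in tf_sites:
            activity += weights[tf] * binding_probability(tf_affinity, nuc_affinity)
    return activity


# Hypothetical promoter with one Rap1 site and two Fhl1 sites.
example_sites = {
    "Rap1": [(1.0, 0.0)],
    "Fhl1": [(2.0, 1.0), (1.0, 2.0)],
}
example_weights = {"Rap1": 2.0, "Fhl1": 4.0}
example_activity = promoter_activity(example_sites, example_weights)
```

In this form, a strong nucleosome affinity at a site lowers the contribution of that site even when the transcription factor affinity is high, which is the competition the model is meant to capture.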